Abstract

The AFRL/Cornell Information Assurance Institute supports a broad
spectrum of research aimed at developing a science and technology base
to enhance information assurance and the trustworthiness of networked
information systems (system and network security, reliability, and
assurance). The institute also fosters closer collaboration between
Cornell and AFRL researchers, facilitates technology transfer, and
exposes Cornell researchers to problems facing the Air Force.
1  Introduction


The AFRL/Cornell Information Assurance Institute (IAI) was established
at Cornell by an initial grant from AFOSR in March 2000. IAI was created
as a prototype for a new mode of funding: granting research funds to a
university center that has close geographic and intellectual proximity
to, but only a loose affiliation with, an AFRL laboratory. After two
years, the results of this experiment confirm that this funding mode has
enormous leverage:

•  IAI  funding  has  enabled  some  absolutely  first-rate  Computer  Science 
research  to  be  performed  at  Cornell. 

•  IAI  funding  has  facilitated  interactions  between  Cornell  researchers 
and  AFRL  staff,  with  research  at  Cornell  now  having  clear  relevance 
to  the  research  needs  of  the  Air  Force. 

Specific  research  accomplishments  supported  under  the  auspices  of  IAI  are 
summarized  below  (§2);  details  can  be  found  in  the  publications  listed  at 
the  end  of  this  report  (§4).  Technology  transitions  and  DoD  interactions  are 
also  discussed  (§3).  Figure  1  lists  those  researchers  at  Cornell  (along  with 
their  specializations)  whose  work  has  been  supported,  in  part,  by  IAI. 

2  Summary  of  Research  Accomplishments 

Scalable Fault-tolerant Systems (Birman, van Renesse). This effort
has focused on the scalability of the publish-subscribe paradigm and has
interacted extensively with Rome/AFRL researchers to understand specific
issues arising from application of the publish-subscribe paradigm within
the JBI effort and within other related military systems. The most
significant accomplishments include the development of (i) the Astrolabe
scalable monitoring and management framework and (ii) the Astrocast
publish-subscribe structure based on a novel Bimodal Multicast. Together,
these provide a new and flexible way of implementing publish-subscribe
services with good scaling properties, stability under stress, and
strong security.
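The general idea behind a two-phase, gossip-repaired multicast can be sketched as follows. This is a deliberately simplified illustration and not the Bimodal Multicast protocol itself: it assumes a lossy best-effort broadcast followed by rounds in which each node still missing the message asks one randomly chosen peer for it. The function name, the loss model, and the pull-only repair are all assumptions made for the example.

```python
import random
from typing import List


def gossip_repaired_multicast(n: int, loss: float, rng: random.Random) -> int:
    """Return the number of gossip rounds until all n nodes hold the message."""
    # Phase 1: best-effort multicast from node 0; each receiver independently
    # misses the message with probability `loss`.
    has_msg: List[bool] = [True] + [rng.random() >= loss for _ in range(n - 1)]

    # Phase 2: in each round, every node that lacks the message contacts one
    # random peer and obtains the message if that peer already had it.
    rounds = 0
    while not all(has_msg):
        rounds += 1
        snapshot = list(has_msg)  # state at the start of the round
        for node in range(n):
            if not snapshot[node] and snapshot[rng.randrange(n)]:
                has_msg[node] = True
    return rounds


rounds_needed = gossip_repaired_multicast(50, 0.3, random.Random(1))
```

Because the repair traffic per node is constant per round regardless of system size, this style of protocol tends to scale gracefully; the real protocol adds message digests, retransmission limits, and membership management that are omitted here.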

Kenneth Birman:             Distributed computing, fault-tolerant network
                            systems, distributed systems security,
                            large-scale network applications.

Robert L. Constable:        Applied logic, automated reasoning, software
                            assurance.

Alan Demers:                Database systems, database replication, and
                            algorithms.

Johannes Gehrke:            Database systems and data mining.

Joseph Y. Halpern:          Reasoning about knowledge and uncertainty,
                            distributed computing, security.

Dexter Kozen:               Proof carrying code, program logics, and
                            semantics.

Christoph Kreitz:           Applied logic, automated reasoning, software
                            assurance.

J. Gregory Morrisett:       Programming languages, compilers, distributed
                            systems, language-based security.

Andrew Myers:               Programming languages, security, mobile code.

Robbert van Renesse:        Distributed computing, fault-tolerant network
                            systems, distributed systems security,
                            large-scale network applications.

Fred B. Schneider:          Distributed systems security and
                            fault-tolerance, mobile code, concurrent
                            programming.

Emin Gun Sirer:             Secure distributed systems, extensible
                            operating systems, language-based security,
                            automated testing.

Jayavel Shanmugasundaram:   Internet data management, database systems,
                            and query-processing in emerging system
                            architectures.

Figure 1: IAI Staff and Research Interests

Program Refinement Logic (Constable, Kreitz). Program Refinement
Logic (PRL) is a logical programming environment that provides
substantial automation in the design, coding, verification, and evolution
of large software systems. It is based on the latest version of the Nuprl
proof development system. A first prototype of a formal digital library
of algorithmic knowledge (FDL) has been completed. FDL provides an
infrastructure for verifying and synthesizing software systems by
supporting the creation of certified algorithmic knowledge, the
cooperation of multiple theorem proving systems, and flexible yet
controlled access to the archived knowledge. Users may contribute library
contents using the Nuprl, MetaPRL, JProver, and PVS theorem provers.

FDL  has  been  used  to  support: 

•  code transformations that improve performance and enable protocols
to be made adaptive while preserving functionality in connection with
a self-adaptive task allocation manager to control processing of
real-time media over a network through coordinated local schedules,

•  the  creation  of  formal  courseware,  and 

•  the  translation  of  formal  proofs  into  natural  language. 

Databases and Data Mining (Gehrke). The Cougar Project has produced
database technology to support distributed wireless sensor networks with
millions of nodes. Here, novel distributed query processing strategies
for long-running queries permit in-network aggregation and can trade
communication for local computation, increasing the lifetime of the
network by up to an order of magnitude. A first version of the system was
demonstrated at 29 Palms in California (Fall 2001) and at the ACM SIGMOD
Conference (2002).
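To illustrate how in-network aggregation trades communication for local computation, the sketch below computes an average over a small routing tree. Each node forwards a single partial aggregate (sum, count) to its parent instead of relaying every raw reading toward the base station. The tree, the readings, and the function names are hypothetical and are not taken from the Cougar system.

```python
from typing import Dict, List, Tuple

# Hypothetical routing tree: node -> children; node 0 is the base station.
TREE: Dict[int, List[int]] = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}
# Hypothetical sensor readings held by the non-root nodes.
READINGS: Dict[int, float] = {1: 20.0, 2: 22.0, 3: 19.0, 4: 21.0, 5: 23.0}


def aggregate(node: int) -> Tuple[float, int, int]:
    """Return (sum, count, messages) for the subtree rooted at `node`.

    Every child sends exactly one message to its parent, carrying the
    partial aggregate for the child's whole subtree.
    """
    total = READINGS.get(node, 0.0)
    count = 1 if node in READINGS else 0
    messages = 0
    for child in TREE[node]:
        c_total, c_count, c_messages = aggregate(child)
        total += c_total
        count += c_count
        messages += c_messages + 1  # one message from child to this node
    return total, count, messages


def raw_forwarding_messages(node: int, depth: int = 0) -> int:
    """Messages needed if each raw reading is relayed hop by hop to the root."""
    own = depth if node in READINGS else 0
    return own + sum(raw_forwarding_messages(c, depth + 1) for c in TREE[node])


total, count, in_network_msgs = aggregate(0)
average = total / count
raw_msgs = raw_forwarding_messages(0)
```

In this toy tree the aggregated query needs 5 messages against 8 for raw forwarding; the gap widens with tree depth, which is where the energy savings in a battery-powered network would come from.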

The Himalaya Project has created some of the world's fastest data
mining algorithms for mining long itemsets, classification-tree
construction, regression-tree construction, and sequence mining. Other
work focused on pushing user-defined constraints (such as those defined
by intrusion-detection systems) deep into the mining algorithm, in order
to improve performance by orders of magnitude versus simple a-priori
model construction followed by a-posteriori model pruning via
constraints. This project also investigated privacy-preserving data
mining algorithms, where datasets can be shared publicly without
compromising values of individual records while at the same time ensuring
that accurate statistical summary information can still be recovered.
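One classic way to share data without exposing individual values while still recovering accurate summaries is randomized response. The sketch below is offered only as an illustration of that trade-off; the report does not state which technique the project used, and the function names, the truth probability `p`, and the synthetic data are assumptions.

```python
import random
from typing import List


def randomize(bits: List[bool], p: float, rng: random.Random) -> List[bool]:
    """Report each bit truthfully with probability p, flipped otherwise."""
    return [b if rng.random() < p else not b for b in bits]


def estimate_fraction(reported: List[bool], p: float) -> float:
    """Recover the fraction of True values from the randomized reports.

    If t is the true fraction, the expected reported fraction is
    q = p*t + (1 - p)*(1 - t), hence t = (q - (1 - p)) / (2*p - 1).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information")
    q = sum(reported) / len(reported)
    return (q - (1 - p)) / (2 * p - 1)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_bits = [i % 10 < 3 for i in range(20000)]  # 30% of records are True
reported = randomize(true_bits, p=0.8, rng=rng)
estimate = estimate_fraction(reported, p=0.8)
```

No single reported bit reveals the underlying record with certainty, yet the estimate over the whole dataset lands close to the true 30%.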

Formalizing Security (Halpern). The use of modal logic has led to a
new formalization of secrecy. This formalization is consistent with
Sutherland's notion of nondeducibility; subsumes separability,
generalized non-interference, and nondeducibility on strategies; and is
able to handle probabilistic secrecy, resource-bounded reasoning, and
downgrading of information.

In addition, a new logic was developed for reasoning about security
policies. The formalism is a fragment of first-order logic that is both
expressive enough to capture many policies of interest and tractable.
Based on the logic, a prototype reasoning engine has been designed. Its
user interface is intended for non-logicians, allowing them to enter
policies and facts about principals, and then to ask questions about the
policies.

Avoiding Malicious Boot Firmware (Kozen). In collaboration with
Architecture Technologies Corporation (Ithaca, NY) and CodeGen Inc. (Palo
Alto, CA), a prototype certifying compiler and verifier for detecting
malicious boot firmware has been developed. Boot firmware modules are
automatically verified against a standard security policy as they are
loaded. Among other things, the security policy being enforced asserts
that drivers must access other devices only through a strict interface
and may not access memory or bus addresses not allocated to them.
Efficient Code Certification, together with inexpensive static checks on
the compiled code, suffices to guarantee dynamic properties of the
program at run time. The prototype is compliant with the now widely used
IEEE 1275 Open Firmware standard for boot firmware. Sample device drivers
written in Java for a block-oriented storage device and a PCI bus have
been successfully compiled.

Cyclone Compiler (Morrisett). Cyclone is a type-safe programming
language that can be roughly characterized as a "superset of a subset of
C." The type system of Cyclone accepts many C functions without change
and uses the same data representations and calling conventions as C for a
given type constructor. It also rejects many C programs to ensure safety.
For instance, it rejects programs that perform (potentially) unsafe
casts, that use unions of incompatible types, that (might) fail to
initialize a location before using it, that use certain forms of pointer
arithmetic, or that attempt to do certain forms of memory management. All
of the analyses used by Cyclone are local (i.e., intra-procedural) to
enable scalability and separate compilation. The analyses are carefully
constructed to avoid unsoundness in the presence of threads.

Experimental validations of Cyclone show great promise. For systems
applications, such as a simple web server, Cyclone introduces virtually
no overhead at all. This is not surprising, as these applications tend to
be I/O-bound. For scientific applications, a much larger overhead is seen
(around 5x for a naive port, and 3x for a port by an experienced
programmer). Some of that overhead is due to bounds and null pointer
checks on array access, which can be eliminated using intra-procedural
analysis; other overhead arises from the use of "fat pointers" and the
fact that GCC does not always optimize struct manipulation.
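The cost of fat pointers and per-access checks can be pictured with a small model. The sketch below is a hypothetical Python rendering of the concept, not Cyclone's actual representation: a fat pointer carries its buffer and offset together, pointer arithmetic is unchecked, and every dereference pays for a null check and a bounds check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FatPointer:
    """Hypothetical model of a pointer that carries its own bounds."""
    buffer: Optional[List[int]]
    offset: int = 0

    def add(self, n: int) -> "FatPointer":
        # Pointer arithmetic may move outside the buffer without complaint...
        return FatPointer(self.buffer, self.offset + n)

    def deref(self) -> int:
        # ...but each dereference is guarded by a null and a bounds check.
        if self.buffer is None:
            raise ValueError("null pointer dereference")
        if not 0 <= self.offset < len(self.buffer):
            raise IndexError("out-of-bounds dereference")
        return self.buffer[self.offset]


def checked_sum(p: FatPointer, n: int) -> int:
    """Sum n elements starting at p; every access is individually checked."""
    return sum(p.add(i).deref() for i in range(n))
```

In a tight numeric loop these per-element checks dominate, which is consistent with the larger overhead reported for scientific codes; an analysis that proves the index stays in range would let a compiler drop them.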

Secure Program Partitioning (Myers). The Jif/split prototypes provide
a new means to ensure that data confidentiality and integrity are
preserved in distributed systems in spite of untrusted hosts and mutually
distrusting principals. This problem is particularly relevant to
information systems used by mutually distrusting organizations, such as
the dynamic coalitions that arise in military settings. With Jif/split,
programs are automatically partitioned into communicating subprograms
that run on the available, partially trusted hosts; to protect data
integrity, information and code are also replicated across the available
hosts. If any host is subverted, then only principals that have
explicitly stated trust in that host need fear a violation of
confidentiality.
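The confidentiality guarantee stated above can be illustrated with a toy placement rule: a piece of data may reside only on a host that its owning principal has declared trusted. The sketch below is a minimal illustration under that assumption; the principals, hosts, field names, and functions are hypothetical, and the real system reasons about security labels on program code and data rather than a simple owner table.

```python
from typing import Dict, Iterable, Set

# Hypothetical trust declarations: principal -> hosts it trusts with its data.
TRUST: Dict[str, Set[str]] = {
    "Alice": {"hostA", "hostT"},
    "Bob": {"hostB", "hostT"},
}
# Hypothetical program fields and the principal whose policy governs each.
OWNER: Dict[str, str] = {"alice_bid": "Alice", "bob_bid": "Bob"}


def place(fields: Iterable[str], hosts: Iterable[str]) -> Dict[str, str]:
    """Assign each field to a host that the field's owner trusts."""
    host_list = list(hosts)
    placement: Dict[str, str] = {}
    for field in fields:
        candidates = sorted(h for h in host_list if h in TRUST[OWNER[field]])
        if not candidates:
            raise ValueError(f"no available host is trusted to hold {field}")
        placement[field] = candidates[0]
    return placement


def exposed_principals(placement: Dict[str, str], subverted: str) -> Set[str]:
    """Principals whose data resided on the subverted host."""
    return {OWNER[f] for f, h in placement.items() if h == subverted}


placement = place(OWNER, ["hostA", "hostB", "hostT"])
```

Under this rule, every principal exposed by a subverted host is, by construction, one that chose to trust that host.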

In-lined Reference Monitors (Schneider, Morrisett). In-lined
reference monitors are a new approach to implementing traditional
reference monitors whereby a desired end-to-end security policy is
formulated using a high-level declarative policy language and then a
rewriting tool is used to automatically rewrite untrusted code into code
that respects the policy. The rewriting tool works by inserting extra
state and dynamic checks into the untrusted code so that the code becomes
self-monitoring.
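The effect of such rewriting can be sketched in miniature. In the example below, a Python decorator stands in for the rewriting tool, and the policy "no network send after a file read" stands in for a declarative policy; both are assumptions chosen for illustration, and none of the names correspond to the actual IRM tools, which rewrite x86 or JVM code.

```python
from typing import Callable, List


class SecurityViolation(Exception):
    """Raised when monitored code is about to violate the policy."""


class NoSendAfterRead:
    """Extra state added by the monitor: has any file been read yet?"""

    def __init__(self) -> None:
        self.has_read = False

    def check(self, event: str) -> None:
        if event == "read":
            self.has_read = True
        elif event == "send" and self.has_read:
            raise SecurityViolation("send after read")


def inline_monitor(policy: NoSendAfterRead, event: str):
    """Stand-in for the rewriter: guard an operation with a policy check."""
    def rewrite(op: Callable):
        def monitored(*args, **kwargs):
            policy.check(event)  # the inserted dynamic check
            return op(*args, **kwargs)
        return monitored
    return rewrite


policy = NoSendAfterRead()
log: List[str] = []


@inline_monitor(policy, "read")
def read_file(name: str) -> str:
    log.append(f"read {name}")
    return "secret"


@inline_monitor(policy, "send")
def send(data: str) -> None:
    log.append(f"send {data}")
```

Calling `send` before any `read_file` succeeds, while a `send` after a read raises `SecurityViolation` before the operation runs, so the untrusted code polices itself without a separate monitor process.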

With prototypes developed for Intel x86 and the Java JVM, the central
question is now one of practicality. To this end:

•  A  set  of  kernel  modifications  was  developed  to  support  a  prototype 
IRM  rewriter  in  Microsoft’s  Windows  operating  system. 

•  A prototype MSIL (Microsoft Intermediate Language) IRM realization
has been developed. It implements an aspect-oriented programming
metaphor for MSIL assembly language (rather than for a high-level
language).

Internet Data Management and Retrieval (Shanmugasundaram). The
QUARK project aims to integrate the database and information retrieval
worlds by building a next-generation database system for handling both
structured and unstructured data. This has required the development of
new techniques for storing and querying semi-structured data (containing
a mix of structured and unstructured data) by using structured relational
database systems. Techniques have also been developed for evaluating
exploratory ranked keyword search queries over semi-structured data.

The  PEPPER  project  has  two  main  goals: 

•  to  build  an  efficient  information  dissemination  (or  publish-subscribe) 
system  for  large-scale  distributed  systems  and 

•  to  develop  a  query  processing  layer  for  peer-to-peer  networks. 

The first goal required the development of a new, scalable index
structure, called RPH-trees, for indexing user preferences so that only
the relevant users are notified when new information becomes available.
An interesting feature of RPH-trees is that they dynamically adapt to the
information workload. For example, if there is a sudden burst of
information about vehicle movement in Northern Afghanistan, the RPH-tree
dynamically and automatically adjusts itself so that this information is
processed efficiently and without delay. For the second goal, a
fault-tolerant and scalable peer-to-peer index structure called P-trees
has been developed. P-trees can support range queries in addition to
equality queries.

MagnetOS (Sirer). MagnetOS is a new distributed operating system for
ad hoc networks. It extends the effective lifetime of an ad hoc or sensor
network through dynamic object migration, providing a single system image
of a unified Java virtual machine across the nodes comprising an ad hoc
network. By automatically and transparently partitioning applications
into components and dynamically placing these components on nodes within
the ad hoc network, MagnetOS reduces energy consumption, avoids hotspots,
and increases system longevity by a factor of four to five.

Developing  MagnetOS  required  solving  two  significant  problems  in  ad 
hoc  networks: 

•  Multipath route selection. Most routing algorithms (including
those that are used in the core of the Internet) use single-path routing
and thus are slow to respond to failures and frequently suffer path
failures. Consequently, a new, efficient algorithm for constructing
highly reliable path sets has been developed.

•  Hybrid Routing Framework. Traditional routing algorithms either
proactively disseminate route updates or defer route discovery until
needed by a client. Choosing between the two regimes is difficult,
since the tradeoff changes based on node mobility rate and
communication patterns. This has led to the development of a new family
of routing protocols that combine proactive and reactive routing
algorithms. These new protocols automatically adjust the radius of
proactive information dissemination to discover routes with low
overhead and latency.
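The role of the proactive radius in a hybrid scheme can be sketched as follows. The example assumes a simple model in which each node keeps routes to everything within `radius` hops (computed here by breadth-first search) and falls back to on-demand discovery for anything farther away. The topology and function names are hypothetical, and the automatic radius adjustment described above is not modeled.

```python
from collections import deque
from typing import Dict, List

# Hypothetical ad hoc topology: a chain a-b-c-d with a branch b-e.
GRAPH: Dict[str, List[str]] = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["b"],
}


def proactive_zone(source: str, radius: int) -> Dict[str, int]:
    """Hop counts to every node within `radius` hops of `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not disseminate beyond the zone boundary
        for neighbor in GRAPH[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def lookup(source: str, dest: str, radius: int) -> str:
    """Classify how a route to `dest` would be obtained from `source`."""
    return "proactive" if dest in proactive_zone(source, radius) else "reactive"
```

A larger radius answers more lookups immediately at the price of more update traffic, while a smaller one saves traffic but adds discovery latency, which is the tradeoff that an adaptive radius aims to balance.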


3  DoD  Interactions  and  Technology  Transitions 

A variety of technology transitions and interactions with DoD
organizations occurred during the period of this funding:

•  Schneider chaired a study for DARPA IPTO Program Manager Jay
Lala on promising research directions for Self-Healing Networked
Information Systems.

•  Schneider chaired the DARPA IPTO Oasis Dem-Val External
Evaluation Committee.

•  Morrisett  and  Schneider  worked  with  Microsoft  to  develop  a  .NET 
in-lined  reference  monitor  (IRM). 

•  Researchers at Carnegie Mellon University, Princeton University,
University of California (Riverside), University of Newcastle-Upon-Tyne,
and Intel Research are all now building on PoET/PSLang IRM tools
developed by Schneider and collaborators.

•  Further public releases of the Jif compiler have been made available
at the Jif web site, http://www.cs.cornell.edu/jif. The Jif language
extends the Java programming language with support for information
flow control. The Jif compiler is implemented on top of the Polyglot
extensible compiler framework for Java. The Polyglot framework has also
been released publicly at http://www.cs.cornell.edu/projects/polyglot,
and researchers at Princeton University are using this framework in
their own research. The releases of both Jif and Polyglot are provided
as Java source code and work on Unix and Windows platforms.

•  AT&T Research is collaborating to develop the Cyclone language,
compiler, and tools. In addition, researchers at the University of
Maryland, the University of Utah, Princeton, the University of
Pennsylvania, and Cornell are all using Cyclone to develop research
prototypes.